Learning to Search with MCTSnets

نویسندگان

Arthur Guez

Théophane Weber

Ioannis Antonoglou

Karen Simonyan

Oriol Vinyals

Daan Wierstra

Rémi Munos

David Silver

چکیده

Planning problems are among the most important and well-studied problems in artificial intelligence. They are most typically solved by tree search algorithms that simulate ahead into the future, evaluate future states, and back-up those evaluations to the root of a search tree. Among these algorithms, Monte-Carlo tree search (MCTS) is one of the most general, powerful and widely used. A typical implementation of MCTS uses cleverly designed rules, optimised to the particular characteristics of the domain. These rules control where the simulation traverses, what to evaluate in the states that are reached, and how to back-up those evaluations. In this paper we instead learn where, what and how to search. Our architecture, which we call an MCTSnet, incorporates simulation-based search inside a neural network, by expanding, evaluating and backing-up a vector embedding. The parameters of the network are trained end-to-end using gradient-based optimisation. When applied to small searches in the well-known planning problem Sokoban, the learned search algorithm significantly outperformed MCTS baselines.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Hybrid Unconscious Search Algorithm for Mixed-model Assembly Line Balancing Problem with SDST, Parallel Workstation and Learning Effect

Due to the variety of products, simultaneous production of different models has an important role in production systems. Moreover, considering the realistic constraints in designing production lines attracted a lot of attentions in recent researches. Since the assembly line balancing problem is NP-hard, efficient methods are needed to solve this kind of problems. In this study, a new hybrid met...

متن کامل

Bilateral Teleoperation Systems Using Backtracking Search optimization Algorithm Based Iterative Learning Control

This paper deals with the application of Iterative Learning Control (ILC) to further improve the performance of teleoperation systems based on Smith predictor. The goal is to achieve robust stability and optimal transparency for these systems. The proposed control structure make the slave manipulator follow the master in spite of uncertainties in time delay in communication channel and model pa...

متن کامل

Multi-objective Differential Evolution for the Flow shop Scheduling Problem with a Modified Learning Effect

This paper proposes an effective multi-objective differential evolution algorithm (MDES) to solve a permutation flow shop scheduling problem (PFSSP) with modified Dejong's learning effect. The proposed algorithm combines the basic differential evolution (DE) with local search and borrows the selection operator from NSGA-II to improve the general performance. First the problem is encoded with a...

متن کامل

Lot Streaming in No-wait Multi Product Flowshop Considering Sequence Dependent Setup Times and Position Based Learning Factors

This paper considers a no-wait multi product flowshop scheduling problem with sequence dependent setup times. Lot streaming divide the lots of products into portions called sublots in order to reduce the lead times and work-in-process, and increase the machine utilization rates. The objective is to minimize the makespan. To clarify the system, mathematical model of the problem is presented. Sin...

متن کامل

Optimizing the Grade Classification Model of Mineralized Zones Using a Learning Method Based on Harmony Search Algorithm

The classification of mineralized areas into different groups based on mineral grade and prospectivity is a practical problem in the area of optimal risk, time, and cost management of exploration projects. The purpose of this paper was to present a new approach for optimizing the grade classification model of an orebody. That is to say, through hybridizing machine learning with a metaheuristic ...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

CoRR

دوره abs/1802.04697 شماره

صفحات -

تاریخ انتشار 2018

Learning to Search with MCTSnets

نویسندگان

چکیده

منابع مشابه

A Hybrid Unconscious Search Algorithm for Mixed-model Assembly Line Balancing Problem with SDST, Parallel Workstation and Learning Effect

Bilateral Teleoperation Systems Using Backtracking Search optimization Algorithm Based Iterative Learning Control

Multi-objective Differential Evolution for the Flow shop Scheduling Problem with a Modified Learning Effect

Lot Streaming in No-wait Multi Product Flowshop Considering Sequence Dependent Setup Times and Position Based Learning Factors

Optimizing the Grade Classification Model of Mineralized Zones Using a Learning Method Based on Harmony Search Algorithm

عنوان ژورنال:

اشتراک گذاری